ABSTRACT

Issues related to achieving adequate national norms
are reviewed, and a new methodology is proposed that would work to
provide a true measure of national achievement levels on an annual
basis and would enable reporting results in current-year norms.
Statistical methodology and technology could combine to create a
national norming process that would publish an annual national norm
soon after test users complete their regular annual testing programs.
This approach would be supplemented by a small national normative
sample and adjusted by a factor to reduce the influence of users.
This approach would also include almost immediate turnaround of the
current-year norms to allow schools to report their annual scores
using those norms. Schools already giving a test would transmit their
data electronically to a central location. Schools in the normative
sample would test and transmit their data. A current-year norms table
could then be produced. This type of norming procedure would probably
require a national center for test norming, as well as cooperation
among test publishers. Cooperation from school districts would be
secured by the fact that they could have national norms within weeks
of transmitting their raw data. It should ultimately be possible to
establish a national educational achievement indicator to rival
Scholastic Aptitude Test scores, an indicator with the simplicity of
the Dow Jones average or the Consumer Price Index. Nine figures
illustrate the discussion. (SLD)
If the U.S. Government Can Tell Us the
Consumer Price Index for February in March,
Why Do We Have to Wait Years for
True National Norms?

OVERVIEW 



Testing practitioners, researchers, lawmakers, parents,
counselors, and test publishers are among those who are
dissatisfied with national norms for standardized achievement
tests. Educators are tiring of being accused of
purposely using old test norms to deceive the public into
believing that the public schools are doing a better job than
they really are. "How Public Educators Cheat on Standardized
Tests" (Cannell, 1989) is a fortuitous blundering
by a crusader onto some very real shortcomings of standardized
test norms and how educators misrepresent them
in their reporting. "Does 'nationally' normed really mean
nationally?" (Baglin, 1980) is a carefully constructed
dissection of a basic flaw in national normings: the
over-influence of users in norming samples. Test publishers
tried to deal with Baglin's criticism by offering incentives
other than credit towards the purchase of their own tests or
texts to schools for participation in norming. Test publishers
tried to deal with Cannell's issues by creating annual
user norms. Neither solved the problem.



This paper reviews the issues related to achieving adequate
national norms and proposes a new methodology that
would work to provide a true measure of national
achievement levels on an annual basis and within a
timeframe that makes reporting results in current-year
norms possible. Yes, this sounds too good to be true, but
in reality, advances in computer technology and a few
simple lessons learned from, of all sources, the Federal
government make this new methodology achievable in
the very short term.

A review of recent articles and publications identified why
current norming methodology is flawed, what is required
for an ideal annual norming, and what is now practical for
a functional annual norming.



Why are our current test norms inadequate? 



❖ They are old the first time they are used.

❖ A true national normative sample is virtually
impossible to achieve.

❖ Annual norms are overly influenced by users with
curricula matched to the test objectives.

❖ Students taking the tests in the norming sample
are unmotivated.

❖ Test administrators in the norming sample may or
may not follow the standardized procedures closely.

❖ Students eligible for the norming may be included
or excluded using different rules than in the
actual testing; makeups may or may not be
administered similarly.

❖ Tests are normed only once or twice during a school
year, requiring interpolation for missing months.

❖ Test levels are normed within a limited range
of grade levels, requiring extrapolation for
vertical scaling.



What would be ideal? 



A national normative sample with full participation by
motivated students, with the same rules for inclusion
or exclusion and make-up testing; students tested under
true standardized conditions (each month of the school
year, every year) in a wide range of grade levels for each
test level.

Impossible. However, there is a compromise alternative
that could work. Statistical methodology and technology
combine now to offer an alternative for a new national
norming process that would create an annual national
norm published soon after test users complete their regular,
annual testing, supplemented by a small national
normative sample and adjusted by a factor to reduce the
influence of the users. This approach must also include
almost immediate turnaround of the current-year norms to
allow schools to report their annual scores using those norms.

This paper details how this new approach can work and how
schools can participate in this annual, or more frequent,
norming over an electronic network that will allow districts
and states to contribute to the norming and receive
new norms tables for reporting local scores within a reasonable
time period after testing. After all, the national
statistics for unemployment, inflation, gross national product,
etc., by no means take as long as test publishers now
take to release national norms. This paper contrasts the
procedures used to gather and report these national statistics
with the proposal for a new national norming system.

Schools which are already giving a test would transmit
their data electronically to a central location; schools
selected in the national normative sample would test and
transmit their data. Using appropriate statistical combinations
and adjustments, a current-year norms table would
be produced and transmitted to local schools; schools
could then convert their scores to the current-year percentiles
and other derived scores for reporting.
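The flow just described (pool user data with the normative sample, adjust for user influence, publish a table, convert local scores) can be sketched in a few lines. This is only an illustration under stated assumptions: the paper does not specify the adjustment factor, so the sketch uses a hypothetical `user_weight` that simply down-weights each user score relative to a normative-sample score, and the function names are invented for the example.

```python
from bisect import bisect_right


def build_norms_table(user_scores, sample_scores, user_weight=0.5):
    """Return {raw_score: percentile_rank} from weighted, pooled scores.

    user_weight is a hypothetical adjustment factor: values below 1
    reduce the influence of self-selected users relative to the
    national normative sample, whose scores each count with weight 1.
    """
    weights = {}
    for score in user_scores:
        weights[score] = weights.get(score, 0.0) + user_weight
    for score in sample_scores:
        weights[score] = weights.get(score, 0.0) + 1.0

    total = sum(weights.values())
    table = {}
    below = 0.0
    for score in sorted(weights):
        w = weights[score]
        # Midpoint percentile rank: weight below plus half the weight at this score.
        rank = 100.0 * (below + 0.5 * w) / total
        table[score] = min(99, max(1, round(rank)))
        below += w
    return table


def to_percentile(raw_score, table):
    """Look up a raw score, falling back to the nearest lower tabled score."""
    scores = sorted(table)
    i = bisect_right(scores, raw_score) - 1
    return table[scores[max(i, 0)]]


# Hypothetical raw scores: four user students, three normative-sample students.
table = build_norms_table([10, 12, 12, 15], [8, 12, 14], user_weight=0.5)
```

A district would then call `to_percentile` on each local raw score. A real center would need a defensible weighting scheme (for example, post-stratification to known population totals) rather than a single constant.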

Many sampling issues currently make participating in a
national norming sample difficult for schools. Publishers
typically want students to take an entire battery of tests in
order to have a realistic testing situation and in order to
calculate composite and total scores. However, a compromise
of this procedure could provide current norms for
individual tests if we are willing to sacrifice having norms
for every total and composite score every year. Individual
students could take a single test during the administration
of their regular achievement testing, rather than the entire
battery. Indeed, this adds to the potential for testing
fatigue, and this must be considered in the scheduling by
individual districts.

Unfortunately, this type of norming procedure would undoubtedly
stretch the planning and computer programming
resources of test publishers. Therefore, we must entertain
the possibility of creating a national center for test norming,
possibly a joint venture among test publishers or a
governmental center supported by public funds. Yes,
public funds. After all, the need for current and comparable
norms is a matter of public interest that apparently cannot
be met with the resources available to individual districts
or even to individual publishers.



What is the theoretical impact of 
biased national norms? 



Five types of norms have been identified: annual true
national norms, point-in-time true national norms,
point-in-time user-influenced norms, annual user norms, and
annual user-influenced norms. The differences between
the scores that would be reported given the use of each of
the five types of norms will be described. The resulting
averages that would be reported for a typical school district
under various combinations of a local district's improving,
staying the same, or declining, and the national average
improving, staying the same, or declining over a six-year
period are discussed.



A Call for a New National Norming Methodology 



A skeptic might look at this proposal and think that the
mechanics of annual norming are too cumbersome to be
accomplished quickly; however, current computer technology
and a little creative, advance programming can
indeed create a system for generating national norms
within a reasonable time of the testing.

This is a fascinating time to work in the achievement
testing industry in the U.S. While in many states such as
Texas, tests are very popular with the legislators, the term
authentic assessment is buzzing around everywhere. The
advocates of authentic assessments as replacements for
traditional multiple choice tests have not yet produced a
saleable product that can be adopted by a state, so most of
the action in the authentic assessment arena is at the local
or national level.

When, in 1980, Roger Baglin challenged the sampling
used for national norming, publishers agreed that their test
users and their text users made up the preponderance of
their national norming samples. However, it took an
aspiring psychiatrist, John Cannell, to challenge how every
state in the nation could be above those national averages.
Some of us thought that the testing establishment would
have to make major changes and admit that we were sliding
along taking advantage of a few psychometric loopholes to
look good; however, if you saw the fall 1990 issue of
Educational Measurement: Issues and Practice, you saw a
great article by Lorrie Shepard, who cited Phillips and Finn,
Drahozal and Frisbie, Lenke and Keene, Williams,
Qualls-Payne, and Stonehill, who all to some degree combined to
prove that indeed it is possible for every state to be above
the national average.

We have been fascinated by the lack of suggestions of how
to solve the real problem: the lack of true annual norms for
our standardized tests. In fact, we are fascinated by the
apparent fact that throwing out multiple choice tests and
embracing alternative assessments is much more popular
at this time than is the notion of improving the multiple
choice tests.

The bottom line, though, is that the testing establishment
has done precious little since Cannell began challenging us
to respond, other than to attempt to produce annual user
norms. Our conclusion after studying annual user norms is
that they are better than most people think they are, and we
will discuss that in more detail later.

The major impact we see coming from Cannell's criticism
has been the movement toward authentic assessments.
H.D. Hoover, author of the Iowa Tests of Basic Skills, has
a way of simplifying complex issues. At the annual meeting
of the Southwest Educational Research Association in
January, Hoover quoted his favorite definition of authentic
assessment as anything other than a standardized test.

President Bush's Education Policy Advisory Committee is
planning a new type of national examination that could be
in place in the next few years (Rothman, 1990 and 1991).
The National Center on Education and the Economy has
called for a national standard for all students based upon a
series of assessments. They have joined with the Learning
Research and Development Center at the University of
Pittsburgh to secure $2.5 million in grants to have in place
an exam system by the year 2000.

Our reaction to this flurry of activity to abandon the
traditional standardized achievement test is: full speed
ahead, see what you can do. However, we have reservations
about their chances for success. The main reservation
is that we strongly suspect that some of the criticism of
standardized tests comes from persons who want to pull out
from under the burden of accountability. Changing to a
different, even better, style of accountability is not what
those critics want; they want an end to testing of any kind.
Other critics have a sincere desire to make assessment
duplicate real applications as much as possible. One of the
most troubling predictions about authentic assessments is
that, at least initially, the performance gap between ethnic
groups will be wider than on standardized tests.

Secondly, every alternative we have heard so far for
authentic assessment takes more time, costs more money,
and is less reliable across locations than what we have now
with multiple choice, standardized tests.

Finally, these national efforts must first solve the issue of
local control over curriculum. Just like NAEP, a national
effort must negotiate a consensus as to what should be
measured. This may not be impossible, but it will be
difficult and require some time.

In the meantime, the authors are unwilling to abandon a
system of standardized achievement testing that has served
us reasonably well, but is suffering from a midlife crisis in
norming. We would prefer to solve the norming crisis.



What's wrong with current norms and 
accountability in education?



"The condition of education" is a phrase we often hear from
people from all walks of life, from casual conversations to
technical journals and official government reports.
Contradictory reports are published almost daily. Do we really
know what is happening to the achievement level of the
children in this country? Are the schools telling the truth?
Do college entrance exams and achievement test scores
really indicate trends in achievement? These two indicators
seem to tell two contradicting stories. SAT scores have
not improved overall, although the percentage of high
school graduates taking the test continues to go up, allowing
students other than the cream of the crop the opportunity
to go to college. Achievement scores, as Cannell
indirectly points out, are going up. If using old norms for
interpreting current achievement makes schools look better
than if they used new norms, it follows that norms are
tougher now because overall scores have
increased.

The fact that public education, which requires a yearly
national investment of over two billion dollars, has flawed
national accountability systems in these "high-tech" times
is inconceivable. In the context of national expenditure by
all governments, education represents approximately 14%,
compared with health and social security (19%), social
welfare (16%), transportation (5%), defense (15%), and all
other areas of government (31%).



In light of this, our interest in education should be twofold:
as an economic investment and as the future of our youth.
We need to know how well specific programs work, but we
also need much better general accountability for our tax
dollar investment.

Other areas of government, particularly those more
directly involved with the economic issues, have developed
sophisticated means of gathering and analyzing data
to obtain trends and indicators that allow decision makers
to steer the nation toward progress and a more
desirable future.

The Consumer Price Index is calculated monthly. The data
are collected through the ongoing point-of-purchase Consumer
Expenditure Survey. This survey is conducted
through visits or calls by trained field representatives from
the Bureau of Labor Statistics. The survey includes
approximately 21,000 retail and service establishments, 40,000
tenants, and 20,000 owner-occupied housing units in 91
urban areas. Consumer Price Index data are issued every
month about three weeks following collection. The Consumer
Price Index Detailed Report, available about three
weeks after the initial release, provides detailed indexes
and a monthly analysis of U.S. price movements. There is
also a semiannual report of recent price movements as
well as long-term trends.



In education, however, not only is the picture quite nebulous,
but it takes a very long time to come out. Indicators
of how the nation is doing in the educational arena take at
least six months at the college entrance exam level. SAT
scores published in September include scores of students
tested in March. Approximately one and
a half to two years pass at the
elementary and secondary school level
before we see test results based on
"current" national norms. National norms
provided by the test publishing companies
take one year to be calculated and a year
and a half to two years before the norms
are used by local and state education agencies
to interpret test data.



In addition to the delayed availability of
national norms, the norms are not true
national norms. Publishers struggle to get
random stratified samples that are representative
of students in the country, but
often they must settle for their third, fourth,
or fifth choices of schools because there is
not much cooperation from school districts
to participate in norming studies.
The norms derived this way often are
based on groups that are users of the test series and/or users
of the textbooks and support materials published by the
publishing company that publishes the test being normed.



The Dow Jones Average, on the
other hand, is updated daily based
on ongoing electronic transmission
of data. The price of each particular
stock is updated continuously
throughout the day, and this information
is available worldwide. An
investor has access to this information
throughout each work day and
can make decisions on whether to
buy or sell stock.



Figure 1

The trend among test publishing companies is to produce
user norms annually or biannually, but, as mentioned
earlier, the norms do not become available until much after
the test is administered, and by the time school districts can
actually use them, they are already old norms. Also,
publishers only report the norms for the test or tests they
publish. Currently, studies equating the different achievement
tests are not available. Each test is like a piece of a
puzzle and the pieces do not fit together.



Over $80 million has been suggested for the next phase of
NAEP, and there are plans to develop a national test that
would include achievement test scores as well as portfolios
and other performance indicators. Why not take advantage
of an effort that is already in place and working relatively
well? It can work a lot better for the districts using it, as
well as provide a national educational indicator that is
timely for making decisions that would affect education at
the local and national levels.



What's good about current national norms
and standardized tests?



❖ They provide a national context for judging local 
performance. 

❖ The norms are based on very large samples that
meet many of the assumptions required.

❖ They are relatively quick and inexpensive to
administer.

❖ They apparently work to some degree because we
continue to use them. Chapter 1 requires them, and
the alternatives have yet to be proven.

Throughout the national debate on the influence of old
norms and users on norms, it has been a struggle to
understand exactly how all these factors work together to
give districts either an advantage or a disadvantage. To
help illustrate those dynamics, several theoretical situations
have been developed.

Taking a very conservative approach, the following as- 
sumptions were defined. 

❖ If one assumes that a test user nets an advantage
each year through familiarity with the test, then the
minimum advantage gained is +1 percentile annually.

❖ If one assumes that the user of a publisher's text nets
an advantage each year through familiarity with the
test contents, then the minimum advantage gained is
+1 percentile annually.

❖ If a new test were to be used each year, then these
advantages for a user would be nullified.

❖ A user in a user-influenced norm group who is at the
true 50th percentile nationally would outscore
nonusers in the norm group and tie other users, thus
netting some average of the two 1 percentile point
advantages for users, and would score at the 51st
percentile rather than the 50th percentile.
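A rough way to see how these assumptions interact is to code them as simple additive rules. The sketch below is not the authors' model: it assumes percentile points add linearly, that the two +1 advantages accumulate for each year the same test is reused after the norming year, and that an annual user norm cancels the familiarity advantage because the norm group shares it. Only three of the five norm types are modeled, because the text does not give the blending rule for user-influenced norms. The function and parameter names are hypothetical.

```python
def reported_percentile(year, norm_type, same_test=True,
                        local_trend=0, national_trend=0,
                        test_adv=1, text_adv=1, start=50):
    """Percentile a test/text-user district would report in a given year.

    year is 1-based, with year 1 as the norming year. Trends are in
    percentile points per year, measured on the norming-year scale.
    """
    elapsed = year - 1
    # Familiarity advantage accrues only while the same test is reused.
    advantage = (test_adv + text_adv) * elapsed if same_test else 0
    true_local = start + local_trend * elapsed

    if norm_type == "point_in_time_true":
        # Norm frozen at the norming year: national drift is not removed.
        pct = true_local + advantage
    elif norm_type == "annual_true":
        # Fresh national norm each year removes the national drift only.
        pct = true_local - national_trend * elapsed + advantage
    elif norm_type == "annual_user":
        # Norm group is made of users with the same advantage, so it cancels.
        pct = true_local - national_trend * elapsed
    else:
        raise ValueError(f"unknown norm type: {norm_type}")
    return max(1, min(99, pct))


# Six-year trajectory with local and national achievement both stable.
stable = [reported_percentile(y, "point_in_time_true") for y in range(1, 7)]
```

Under these simplified rules, an annual user norm and an annual true national norm with a new test each year both report the 50th percentile in year 6, while a frozen norm with a reused test drifts upward. The exact figures depend entirely on the additive assumptions stated above.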

Figures 2-4 show the theoretical outcomes from six years
of testing for a district using various assumptions about the
achievement trends across those years.

In our first example, Figure 2, we have a district that
is a test user and a text user, is at the national average in year
one, and maintains that same level of achievement over the six
years, while users and nonusers do the same nationally.

The influence of all these factors results in a maximum
difference of 11 percentile points in year 6 across the 10
combinations of norm types and tests shown. That is a
major influence, using very conservative assumptions. If
you believe that an 11 percentile point bias is higher than
reality, then maybe those conservative assumptions were
not so conservative at that. In that case, the influence of the
user factors described in the assumptions would be smaller
than some may think.

Interestingly, the combinations that most closely match the
results from an annual true national norm using a new test
each year are annual user norms using the same test each
year or a new test each year. Both yield a percentile of 50
in year 6.
Figure 2

Achievement Trends: Local Maintaining, National Maintaining



Assumptions



Local School District

• Text User (+1 %ile each year same test is used)

• Test User (+1 %ile each year same test is used)

• At 50th %ile in Norming Year, True National Norm

• Maintaining Same Achievement Level Annually

Users Nationally

• Average at 50th %ile, True National Norm

• Follow Same Achievement Trend as National Trend

National Achievement

• Maintaining Same Achievement Level Annually
Our second example in Figure 3 differs from the first only
in that the local and national averages are going up 2
percentile points annually. In this context, an annual user
norm is also as accurate as a true national norm when a new
test is administered each year. Few districts do that, so the
more important finding is that the annual user norm tracks
an annual true national norm whether or not a new test is
given each year. The most distorted view comes from a
point-in-time user-influenced norm, which is exactly what
most districts use for reporting.

Our third example in Figure 4 illustrates the situation when
local and national achievement is declining at a 2 percentile
point per year rate. The dynamics are that the local user
district is advantaged by the use of the same test each year;
the losses over time are masked, along with the fact that
local achievement is tracking the national average.



Figure 3

Achievement Trends: Local Up, National Up



Assumptions



Local School District

• Text User (+1 %ile each year same test is used)

• Test User (+1 %ile each year same test is used)

• At 50th %ile in Norming Year, True National Norm

• Gaining 2 %ile points from Norm Year Annually

Users Nationally

• Average at 50th %ile, True National Norm

• Follow Same Achievement Trend as National Trend

National Achievement

• Gaining 2 %ile points from Norm Year Annually

If we were to write a handbook of how to use national norms
to rank high on standardized tests, what would we say?

If national achievement is remaining stable, and you are a
text/test user, then there is no real advantage or disadvantage
in annual user norms. The big advantage comes when
users give the same test annually and compare themselves
to the old, point-in-time national norm, either a true
national norm or a user-influenced one.

If national achievement is going up, a point-in-time,
user-influenced norm is the most advantageous. This confirms
that the test publishers have been on to a great marketing
strategy over the last two decades.



Figure 4

Achievement Trends: Local Down, National Down



Assumptions

Local School District

• Text User (+1 %ile each year same test is used)

• Test User (+1 %ile each year same test is used)

• At 50th %ile in Norming Year, True National Norm

• Losing 2 %ile points from Norm Year Annually

Users Nationally

• Average at 50th %ile, True National Norm

• Follow Same Achievement Trend as National Trend

National Achievement

• Losing 2 %ile points from Norm Year Annually




If national achievement is going down, an annual true
national norm or a user norm would keep a typical district
at the national average. The worst case would be to use the
old point-in-time national norm, because your district
would fall behind the artificially stable national norm.
However, if the downward trend is equivalent to the
advantage gained from being a test/text user who gives the
same test annually, the loss in achievement is
effectively masked.



The point of all this is to confirm that the best norm is
an annual norm, based either on a true national sample
or on a user sample (if you are a user also).

However, we need to acknowledge a pitfall of an annual
norm. Imagine that your district is soaring ahead and
making tremendous gains. At the same time, the national
average is going up. You have told your superintendent
and board of trustees about these great gains, and they have
asked you to provide a graph to illustrate them.




Your district's longitudinal graph would look like Figure
5: flat. This is similar to the dilemma we face when
parents and teachers of gifted students complain that their
students score at the 99th percentile every year and show
no progress; they never can go up.

What this confirms is that there is a legitimacy in using a
point-in-time national norm. As Figure 6 shows, setting a
baseline year for future comparisons provides an indication
of the trend over time. The problem we have had with
critics like Cannell is that we have not communicated that
legitimacy and have given the impression that we are
making comparisons to current national achievement levels.
If we were to conduct a local survey of the price of a
gallon of gas, then compare that to the national average
price from two years ago and claim that local prices are
cheaper, most citizens off the street would realize we are
not operating with a full tank of gas. However, that is
exactly what we do when we use point-in-time test norms.

Figure 5

Local Average Represented in
Annual Norms
(Local and National Achievement Rising)

[Line graph: percentile (vertical axis) by year of testing (horizontal axis).]

Comparing to a base year is legitimate; however, we must
communicate to the audiences that is what we are doing.


Figure 6 
What is required for a true and timely national norm?



❖ Participation by a sample that truly represents students
across the nation

❖ Speedy results



This does not mean results the next year, but results fast
enough to use in the current year's report. Yes, if we test
in April, we should be able to report those scores in
current-year percentiles during the summer.



Why do we think this can be accomplished? 



♦ Scoring of tests by districts and scoring services has
improved dramatically. Most districts get 2-4 week
turnaround from scoring services, and districts with
in-house scoring accomplish the task even quicker.
In Austin, the testing staff collects answer sheets by
the Friday after testing and delivers results to the
schools on Monday morning.

♦ To get the scores to a central location for calculating
the norms, we now have national electronic networks.






An example of a somewhat similar communications issue
that is being addressed through electronic networking is
the National Center for Education Statistics (NCES)
Interstate Student Record Transfer System (ISRTS), which
is currently in a pilot stage. This system is setting a
national standard for formatting student information and
sending it across networks between public schools, between
public schools and institutions of higher education,
and between institutions of higher education. The goals
include decreasing the time required to transmit records
and reducing the costs for sending paper records that must
be reentered into the new school's computer files.




♦ With today's computers and with some advanced
programming, the number crunching required can
be accomplished quicker.

♦ Those same electronic networks can distribute the
new norms tables back to school districts.

In Roger Baglin's 1980 paper on self-selection bias in
national norms, he raised the notion of a joint norming
effort by the test publishers. There are restraint-of-trade
laws that would kill such cooperation. However, the cost
for a national clearinghouse funded by the federal
government would be much less than the cost for NAEP or
any of the other national efforts being planned.

There would be lower costs for test development, for
additional tests to be printed, and for scoring services.
With IRT and other equating methods, local choice of tests
could continue, and we could still make comparisons
across states. The problems encountered in the 1970
anchor study could not be ignored, but could be addressed.
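To illustrate what "equating" means in the simplest case, the sketch below implements linear (mean-sigma) equating, one of the classical non-IRT methods. It is a stand-in chosen for brevity, not the method the proposal commits to. It assumes the two score lists come from the same or randomly equivalent groups of students, and the function name and data are hypothetical.

```python
from statistics import mean, pstdev


def linear_equate(x, scores_x, scores_y):
    """Map a score x on test X to the scale of test Y.

    Matches the mean and standard deviation of the two score
    distributions, which are assumed to come from equivalent groups.
    """
    mean_x, mean_y = mean(scores_x), mean(scores_y)
    sd_x, sd_y = pstdev(scores_x), pstdev(scores_y)
    if sd_x == 0:
        raise ValueError("test X scores have no spread")
    return mean_y + (sd_y / sd_x) * (x - mean_x)


# Hypothetical raw scores from one group that took both tests.
equated = linear_equate(25, [10, 20, 30], [40, 60, 80])
```

Here a raw score of 25 on test X sits half a step above the X mean, so it maps to 70 on the Y scale. IRT-based equating would replace the mean and standard deviation matching with item parameter estimates, but the purpose, putting scores from different tests on a common scale, is the same.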



National Achievement Indicator and 
Norming Program 



The proposed solution to the timely norming problem and
the creation of a national educational indicator could be
obtained by centralizing the creation of annual norms for
the major standardized achievement tests. This could be
done through the use of electronic transmission of data and
the voluntary cooperation of the districts using these tests.

Cooperation from school districts would be secured by the
fact that they would get current national norms within
weeks of transmitting their raw data. Currently, districts
are reluctant to participate in norming studies because that
implies additional testing: either a new version of the test
they are currently using or a complete battery of a test they
are not using and do not intend to adopt in the future. With
this system, the districts would be required to administer
one extra subtest to each student in addition to the test they
normally administer.



Currently, a national norming study requires 100,000 to
200,000 students taking the test (including grades K-12).
With the proposed program, a norming group could be
much larger, as illustrated in Figure 7. An additional
subtest would be assigned by the norming center based on
stratified sampling methods that would distribute the
subtests in such a way that the sample size and makeup
would provide a valid norming group.

In the example presented in Figure 7, the norming center
would select 200,000 test users to be included in the
norming sample. Four subsets of 50,000 students each
would be administering subtests from four other tests.
Within each subset of students, five groups of 10,000
students each would be administering different subtests of
the assigned test.
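The arithmetic of this paradigm (200,000 students, 4 other tests, 5 subtests each, 10,000 students per cell) can be sketched as an assignment routine. The example below uses simple random assignment rather than the stratified sampling the proposal calls for, and the test labels, function name, and seed are all hypothetical.

```python
import random
from collections import Counter


def assign_subtests(student_ids, other_tests, subtests_per_test, seed=0):
    """Spread students evenly over (other test, subtest number) cells.

    Simple random assignment is used here as a simplification; a real
    norming center would stratify by region, district size, and so on.
    """
    cells = [(test, sub)
             for test in other_tests
             for sub in range(1, subtests_per_test + 1)]
    ids = list(student_ids)
    random.Random(seed).shuffle(ids)
    # Round-robin over the shuffled list gives equal cells when divisible.
    return {sid: cells[i % len(cells)] for i, sid in enumerate(ids)}


assignment = assign_subtests(range(200_000), ["B", "C", "D", "E"], 5)
cell_counts = Counter(assignment.values())
test_counts = Counter(test for test, _ in assignment.values())
```

With these inputs there are 20 cells of 10,000 students, and each of the four other tests is linked through 50,000 students, matching the figures in the example above.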



Figure 7

Sampling Paradigm for National Norms
The additional subtest would be used for equating purposes,
thus linking together all the tests to create norms that
are not user influenced. One of the sacrifices that could be
made to make the system functional is multiyear norms for
total and composite scores. Most tests combine various
subtests into these composites in order to give users fewer
scores to deal with in reporting and selecting students for
programs. In reality, those composites require that students
in the norming sample take all tests included, because the
composites are more than the true arithmetic average of
subtests. However, with past experience on our side, a
national center could estimate composite score norms from
individual test norms. This would allow for a testing
schedule that requires individual students to take fewer
subtests each year. Periodic testing with a full battery
could be done at less cost and less disruption of instruction.

After test administration in spring and fall, districts would
process the tests in the way they are accustomed to (either
in-house scanning, using a scanning service, or the services
offered by the test publishers). The data would be
electronically transmitted to the norming center and, within
approximately a month, they would receive the current
national norms with which to interpret their current
year's achievement.




Figure 8

Flow of Information: National Center
for Achievement Test Norming
and Data Clearinghouse

[Flow diagram. Recoverable labels: demographic data and raw scores; equating sample demographic information; norms tables; create converted scores (GE, NCE, stanine, etc.).]



The notion of a national achievement indicator to rival
SAT/ACT scores is a dream of NAEP and the newer
national testing movements. However, the simplicity of
the Dow Jones Average, the Consumer Price Index, or the
SAT/ACT averages is a real challenge. A national
achievement indicator could be a composite across all
tested areas or merely be two representative areas such as
reading and mathematics. Remember, simplicity is the
key, so having separate numbers for all subject areas
detracts from the utility of the indicator. A simple option
is to create a scale with 100 being the national average for
a base year and achievement test scores being converted to
this scale annually.
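One way to read the base-year scale is as a ratio index in the style of a price index. The sketch below is only one possible interpretation: the text does not say whether the conversion should be a ratio, a standardized difference, or something else, so the ratio form, the function names, and the sample means are all assumptions.

```python
def achievement_index(current_mean, base_year_mean):
    """Express the current national mean on a scale where the base year is 100."""
    if base_year_mean <= 0:
        raise ValueError("base-year mean must be positive")
    return 100.0 * current_mean / base_year_mean


def composite_index(current_means, base_year_means):
    """Average the per-subject indexes into a single headline number."""
    indexes = [achievement_index(current_means[subject], base_year_means[subject])
               for subject in base_year_means]
    return sum(indexes) / len(indexes)


# Hypothetical national mean scale scores for two representative subjects.
base = {"reading": 50.0, "mathematics": 40.0}
now = {"reading": 52.0, "mathematics": 40.0}
headline = composite_index(now, base)
```

With these numbers reading stands at 104 and mathematics at 100, giving a headline value of 102. A ratio index only makes sense on an equated scale with a meaningful zero, so a production indicator might instead express change in base-year standard deviation units.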



There are two options for the norming center. One is the
most conservative (Figure 8), having the center be only a
norming center that collects raw scores and demographic data
and returns tables to the districts for score interpretation.

In the second option (Figure 9), the center is a national
scoring service and norming center, where the districts
would transmit item responses and demographic information
for each student. The center would score the tests,
transmit the item responses to the test publishers for
research purposes, and electronically transmit students'
converted scores back to the districts within a few weeks.

The first option has the advantage of a lower cost and
simplicity of operation.
Figure 9
Flow of Information:
National Center for Achievement Test Scoring,
Norming Center, and Data Clearinghouse




Conclusion 

School systems are spending millions of dollars for norm-referenced achievement tests, but we do not have true national
norms. Now is the time to take advantage of existing technology to give us timely, true national norms.

Together, we should work toward the day when a state legislator, or a governor, or even the education president wakes up
in the morning and checks the paper for that day's Dow Jones Average, the Consumer Price Index, and of course, the
national achievement level and says, "Wow, it was a good week for public education."